Abstract

Sesame is an important oil crop because of its high oil content and quality. Seed oil and protein contents are two
important traits in sesame. To identify molecular markers associated with seed oil and protein contents in sesame, we
performed association mapping among 369 worldwide germplasm accessions under 5 environments using 112 polymorphic SSR
markers. The general linear model (GLM) was applied with the criteria of -logP ≥ 3.0 and high stability under all 5
environments. Among the 369 sesame accessions, the oil content ranged from 27.89% to 58.73% and the protein content
ranged from 16.72% to 27.79%. A significant negative correlation between oil content and protein content was found in
the population. A total of 19 markers for oil content were detected, with R² values ranging from 4% to 29%; 24 markers
for protein content were detected, with R² values ranging from 3% to 29%, of which 19 markers were associated with both
traits. Moreover, some of the markers were confirmed using the mixed linear model (MLM) method, which suggested that
the oil and protein contents are controlled mostly by major genes. Allele effect analysis showed that the allele
associated with high oil content was always associated with low protein content, and vice versa. Of the 19 markers
associated with oil content, 17 were located near plant lipid pathway genes and 2 were located just next to a fatty
acid elongation gene and a gene encoding stearoyl-ACP desaturase, respectively. The findings provide a valuable
foundation for oil synthesis gene identification and marker-assisted selection (MAS) breeding in sesame.
Introduction

Sesame (Sesamum indicum L.) is an ancient and important
oilseed crop cultivated mainly in the tropical and
subtropical regions of Asia, Africa and South America. The
harvested area of world sesame reached 7.3 × 10^6 hm², and the
total production per year was roughly 3.7 × 10^6 tons from 2001 to 2010
(FAO data). Compared with other main oil crops, e.g., soybean
(18% average oil content) [1], oilseed rape (41%) [2], sunflower
(40-44%) [3] and peanut (51%) [4], sesame is one of the oil
crops with the highest oil content and quality. Sesame seeds
contain 55-58% oil and almost 18% protein. Among the fatty
acids in sesame seeds, oleic acid (18:1) (39.6%) and
linoleic acid (18:2) (46.0%) are the two main components, with an
ideal ratio of almost 1:1 [5,6]. Apart from seed yield, the
content of seed storage oil and protein is the most prominent agronomic
trait in sesame breeding [7].

In the past two decades, to understand the basis of the high
quality of sesame oil and protein, many researchers have focused on
seed development and on fatty acid and storage protein synthesis,
as well as on identifying lipid synthesis related genes
and molecular markers in sesame [8-12]. Of the three available
cDNA libraries, two were constructed for seed development
analysis [13,14]. However, oil and protein contents are complex
quantitative traits that are affected by both genotype and
environment [15]. At present, the mechanism underlying the high oil
content and quality of sesame seeds is still unclear. No loci for oil and
protein content traits have been found in the sesame linkage
maps. Although Wei et al. [12] performed an association
analysis of seed oil content, protein content and fatty acid composition
within 216 Chinese sesame accessions using 79 molecular primer
pairs (including SSRs, SRAPs and AFLPs), only one associated
marker (M15E10-3) was identified under two environments.
Therefore, to precisely detect the genes or markers
associated with oil and protein content traits and to improve
sesame breeding, more efficient markers and germplasm resources
with larger phenotypic variation need to be applied [16,17].

Currently, linkage analysis (QTL mapping) and association
mapping are the two main tools for dissecting
complex phenotypic variation. Compared with traditional
linkage analysis based on mapping populations, association
mapping offers higher precision for locating QTLs and selecting
molecular markers [16,18]. To date, association mapping has
been used extensively for analyzing important agronomic and
quantitative traits in wheat, maize, cotton, oilseed rape and other
crops [18-22]. In the past several years, a large number of simple
sequence repeat (SSR), or microsatellite, markers with high polymorphism
have been developed in sesame [7,23,24]. Accordingly,
association mapping is becoming a reliable and powerful approach for
detecting the genes or markers associated with key traits and for improving
molecular marker-assisted selection (MAS) in sesame breeding
programs.
The objectives of this study were: (1) to perform association
mapping of seed oil content (OC) and protein content (PC) traits in
worldwide sesame accessions using the GLM and MLM models,
(2) to characterize sesame oil and protein contents
under various environments, and (3) to determine the key SSR
markers associated with seed quality. In this report, a natural
population covering 369 worldwide accessions from China and
15 other countries and 112 pairs of polymorphic SSR markers
were used. The results provide a foundation for investigating
seed development-related genes and seed quality in sesame.
Results



Seed oil and protein content variation in the natural 
population 

Both the seed oil content (OC) and the protein content (PC) are
often influenced by the environment. To reduce the environmental
effect, we collected phenotypic data for the 369 sesame
accessions under 5 environments: the Pingyu and Yuanyang
locations in 2011, and the Pingyu, Yuanyang and Xinyang locations
in 2012. The descriptive parameters under each environment were
calculated (Table 1). The results showed that OC and PC
varied significantly among the 369 accessions. Overall, the OC of
the natural population ranged from 27.89% to 58.73%, with
averages of 49.59% to 53.14% across environments; the PC ranged from
16.72% to 27.79%, with averages of 20.28% to 22.51%. All the
datasets showed a normal or nearly normal distribution. To
determine the heritability of the phenotypes, we performed
variance analysis of the oil and protein contents (Table 2). The results
indicated that the OC and PC traits were significantly influenced
by genotype and environment (i.e., year and location). No
significant interactions between variety and environmental factors
(year and location) were detected. Moreover, the OC and PC traits
showed a significant negative correlation under the 5
environments: the correlation coefficient (r) between OC and
PC varied from -0.66 to -0.72 (P<0.01) in 2011 and from
-0.52 to -0.74 (P<0.01) in 2012 (data not shown).
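
As a quick arithmetic check on the variance analysis, the F values reported in Table 2 should be close to each mean square divided by the residual mean square. The sketch below assumes that convention (the text does not state which error term was used) and takes the OC mean squares from Table 2; the helper name `f_ratio` is illustrative only.

```python
# Mean squares for oil content (OC), copied from Table 2.
OC_MEAN_SQUARES = {
    "Year": 27.33,
    "Location": 1564.58,
    "Variety": 62.78,
    "Year x Variety": 3.19,
    "Location x Variety": 2.94,
}
OC_RESIDUAL_MS = 3.66


def f_ratio(ms_effect: float, ms_residual: float) -> float:
    """F statistic, assuming the residual mean square is the error term."""
    return ms_effect / ms_residual


if __name__ == "__main__":
    for source, ms in OC_MEAN_SQUARES.items():
        print(f"{source:20s} F = {f_ratio(ms, OC_RESIDUAL_MS):.2f}")
```

Under this assumption the ratios agree with the tabulated F values up to the rounding of the mean squares, and the two interaction terms give ratios below 1, consistent with the absence of significant genotype-by-environment interaction.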
Linkage disequilibrium

Linkage disequilibrium (LD) refers to the non-random association
of alleles at different genetic loci. A total of 112 SSR
markers were used to estimate the LD level among the 369
sesame germplasm accessions (Table S2). These SSR markers
were distributed in 33 contigs/scaffolds with a total length of 180.86
Mb, which represents approximately 67% of the
assembled genome size (270 Mb) and 50% of the
estimated genome size (360 Mb). The average SSR density was 1
SSR per 1.6 Mb. To assess the associations between the
polymorphic loci of the 112 SSR markers, LD P-values were
determined for the 6,216 locus pairs (i.e., 112 × (112 - 1)/2) using
Fisher's exact test and the two indexes D' and r² (Figure 1). The
average values of D' and r² for the 6,216 pairs were 0.1649 and
0.0173, respectively. Of the 6,216 pairs, 2,584 pairs (41.57%)
showed significant linkage disequilibrium (P<0.01), 363 pairs
(5.84%) showed higher linkage disequilibrium with D'>0.5, and 33
pairs (0.5%) gave a D' value of 1.0 (i.e., complete linkage). The
data indicated that linkage disequilibrium existed among the
sesame accessions.
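
To make the two LD indexes concrete, the following minimal sketch computes D' and r² for a pair of loci and reproduces the count of locus pairs. It assumes the textbook definitions for two biallelic loci with known haplotype frequencies; the SSR loci in this study can carry up to 5 alleles, for which TASSEL uses a multi-allelic extension, so this is an illustration rather than the procedure actually used. The function names are hypothetical.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (D', r^2) for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A and allele B;
    p_a and p_b are the frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


def n_locus_pairs(n_markers: int) -> int:
    """Number of unordered marker pairs, n(n - 1)/2."""
    return n_markers * (n_markers - 1) // 2


if __name__ == "__main__":
    print(n_locus_pairs(112))             # number of pairs tested in the text
    print(ld_stats(0.5, 0.5, 0.5))        # complete LD
    print(f"{2584 / n_locus_pairs(112):.2%}")  # share of significant pairs
```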



SSR marker diversity and population structure 

Before the association analysis, the polymorphism of the 112
SSR markers within the 369 germplasm accessions was investigated (Table
S2). The number of alleles ranged from 2 to 5,
with an average of 2.47 per locus. The PIC values of the markers
ranged from 0.028 (Hs373) to 0.669 (Y1994), with an average value
of 0.302. The percentage of heterozygotes per marker varied from
0.27% (Hs373) to 23.12% (Hs4325), with an average of 10.14%.
The data indicated that the natural population had high
heterozygosity and was suitable for association mapping analysis.
Subsequently, we estimated the population structure, because admixture
of subpopulations can produce LD and lead to false-positive
results. The most likely number (K) of subgroups in the 369
germplasm accessions was estimated using the 112 SSR markers
(Figure 2). As K increased from 1 to 10, the value of LnP(D)
increased steadily, whereas ΔK decreased after a
clear peak value of 643.5 at K=2. The results indicated that the
population was roughly composed of two divergent subgroups
according to the Bayesian posterior probability analysis (Figure 3).

Table 2. Analysis of variance of the oil and protein contents in the population under 5 environments.

| Source of variation | DF | OC MS | OC F value | PC MS | PC F value |
| --- | --- | --- | --- | --- | --- |
| Year | 1 | 27.33 | 7.46** | 211.90 | 412.83** |
| Location | 2 | 1564.58 | 427.02** | 551.56 | 1074.55** |
| Variety | 368 | 62.78 | 17.13** | 5.23 | 10.19** |
| Year × Variety | 368 | 3.19 | 0.87 | 0.55 | 1.07 |
| Location × Variety | 736 | 2.94 | 0.80 | 0.65 | 1.27 |
| Residual | 367 | 3.66 | | 0.51 | |

** Significant at P<0.01.
DF denotes degrees of freedom; MS denotes mean square.
doi:10.1371/journal.pone.0105757.t002




Figure 1. Disequilibrium matrix of the 112 polymorphic SSR sites, plotted on both the X-axis and Y-axis. The matrix is divided into two regions by
the diagonal line. The upper right region shows the D' value of each SSR pair, and the corresponding blocks in the lower left region show
the significance of D'. Value intervals of D' and P are shown in different colors according to the color scales on the right.
doi:10.1371/journal.pone.0105757.g001



PLOS ONE | www.plosone.org | August 2014 | Volume 9 | Issue 8 | e105757

Association Mapping in Sesame



Of the 369 accessions, 126 were assigned to subgroup
1 (green in Figure 3) and 243 to subgroup 2 (red
in Figure 3). Most (47 out of 51) of the foreign lines were
placed in subgroup 2. The two subgroups in the population were
considered to have a complex 'admixture' relationship, and no
significant correlation between geographic origin and subgroup
was found among the Chinese lines.
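
The ΔK statistic used to choose K is usually computed with the Evanno method from the LnP(D) values of replicate STRUCTURE runs. The text does not spell out the formula, so the sketch below assumes that common definition, ΔK = |L(K+1) - 2L(K) + L(K-1)| / sd(L(K)), where L(K) is the mean LnP(D) over runs at a given K. The input values are hypothetical and are not the LnP(D) values from this study.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp_runs: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta-K for every K that has both neighbours K-1 and K+1."""
    means = {k: mean(values) for k, values in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 in means and k + 1 in means:
            sd = stdev(lnp_runs[k])  # spread of LnP(D) across replicate runs
            if sd > 0:
                second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
                result[k] = abs(second_diff) / sd
    return result


# Hypothetical LnP(D) values, three replicate runs per K (illustration only).
example_runs = {
    1: [-9000.0, -9001.0, -8999.0],
    2: [-8000.0, -8001.0, -7999.0],
    3: [-7900.0, -7905.0, -7895.0],
    4: [-7850.0, -7860.0, -7840.0],
}

if __name__ == "__main__":
    dk = delta_k(example_runs)
    best_k = max(dk, key=dk.get)
    print(dk, "most likely K:", best_k)
```

With these made-up inputs the large jump in LnP(D) between K=1 and K=2, followed by a plateau, produces a ΔK peak at K=2, which is the same qualitative pattern the study reports.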

Marker-trait associations 

In this study, the association analysis was performed using the
general linear model (GLM) method. The stringent criterion of
-logP ≥ 3.0 under all 5 environments was used to determine
significant associations with the OC and PC traits. For the
OC trait, 19 markers were detected, with R² values ranging from
4% to 29% (Table 3 and Table S2). According to the primer locations
in the sesame genome, the 19 markers mapped to 11 contigs/scaffolds.
Of the 19 markers, Hs485 and Hs586 are located in
scaffold00001; Hs4381, Hs02 and Hs4082 are located in C01; and
Hs4061, Hs345 and Hs 19563 are in C04. The distance between
markers located in the same contig/scaffold was shorter
than 2 Mb (Table S2). For the PC trait, 24 markers were detected,
with R² values ranging from 3% to 29%, and mapped to 15 contigs/scaffolds
(Table 4 and Table S2). Comparison indicated that
19 of these 24 markers were also associated with the OC trait;
the other 5 markers, Hs425 (in C08), Hs560 (in C11),
Hs4265 (in C12), and Hs4089 and Hs1514 (in C19), were unique
to the PC trait. In addition, to assess stability, we
performed association mapping using the mixed linear model (MLM)
method, with the criterion of -logP ≥ 3.0 under at least 3
environments.

For the OC trait, 9 markers were detected with the MLM model, and
8 markers (Hs345, Hs4381, Hs485, Hs1036, Hs4061, Hs635,
Hs376 and Hs586) were confirmed by both the GLM and MLM
methods. In particular, 4 markers (Hs345, Hs4381, Hs485 and
Hs1036) had high R² values (≥10%) under all 5 environments
(Table 3 and Table S3). For the PC trait, 9 markers were found
and confirmed in both models, of which 7 markers had high R²
values (≥10%) under all 5 environments (Table 4 and Table S3).
These results suggested that the OC and PC traits are controlled
mostly by major genes in sesame.
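
To illustrate what the R² and -logP criteria measure, here is a simplified sketch of a single marker-trait test. It is a plain one-way ANOVA of the phenotype on genotype class, which is an assumption made for brevity: the GLM used in this study also fits the population-structure (Q) covariates, and the MLM adds the kinship (K) matrix. The genotype labels and oil contents are hypothetical, and the P-value step (an F-distribution tail probability) is left out because it needs a statistics library.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Sequence, Tuple


def neg_log10_p(p_value: float) -> float:
    """Convert a P-value to the -logP scale used for the threshold."""
    return -math.log10(p_value)


def marker_r2_and_f(genotypes: Sequence[str],
                    phenotype: Sequence[float]) -> Tuple[float, float]:
    """One-way ANOVA of phenotype on genotype class; returns (R^2, F)."""
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, phenotype):
        groups[genotype].append(value)
    grand_mean = mean(phenotype)
    ss_total = sum((y - grand_mean) ** 2 for y in phenotype)
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2
                     for v in groups.values())
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = len(phenotype) - len(groups)
    r2 = ss_between / ss_total  # share of phenotypic variance explained
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return r2, f_stat


# Hypothetical oil contents (%) for six accessions at one SSR locus.
example_genotypes = ["1:1", "1:1", "1:1", "2:2", "2:2", "2:2"]
example_oil = [52.0, 53.0, 51.0, 43.0, 44.0, 42.0]

if __name__ == "__main__":
    r2, f_stat = marker_r2_and_f(example_genotypes, example_oil)
    print(f"R^2 = {r2:.3f}, F = {f_stat:.1f}")
    print(neg_log10_p(0.001))  # a P-value of 0.001 sits at the 3.0 threshold
```

On this scale the -logP ≥ 3.0 criterion corresponds to P ≤ 0.001 in each environment.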

Marker effect on the phenotypic variation 

To explore the relationship between the above markers and
phenotypic variation, and their potential utility in sesame MAS
breeding programs, we analyzed the allelic effects of five
markers associated with both traits (Table 5). For each marker, the
estimated effects were consistent for a given allele
under the 5 environments. Hs345 had the largest effect on the
variation of seed oil and protein content. The Hs345-1:1 and
Hs345-2:2 alleles showed different effects on the OC
trait, with the average oil content of the accessions ranging from
52.05% to 42.82%. In the genotypes carrying the Hs4381-1:1 allele,
oil and protein contents were 43.89% and 23.25%, respectively,
while the samples with Hs485-2:2 contained 52.00% oil and
21.00% protein. Furthermore, specific alleles had large
negative or positive effects on the OC or PC trait.
For example, the allele effect of Hs345-2:2 on the OC trait ranged
from -8.13 to -12.18, whereas its effect on the PC trait ranged from
2.05 to 3.82. Comparison suggested that the alleles of all 5
markers had opposite effects on the OC and PC traits.
An allele that increased oil content consistently
had a negative effect on protein content, and vice versa.
Therefore, these markers could be used for screening sesame lines
with either high oil or high protein content.
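
The opposite signs of the allele effects can be illustrated with a small sketch that expresses each genotype class as a difference from a reference class mean. This is an assumed, simplified definition (the exact estimator behind Table 5 is not given in the text), and the per-accession values below are hypothetical, chosen only to resemble the magnitudes reported for Hs345.

```python
from statistics import mean
from typing import Dict, Sequence


def class_effects(genotypes: Sequence[str],
                  phenotype: Sequence[float],
                  reference: str) -> Dict[str, float]:
    """Mean phenotype of each genotype class minus the reference class mean."""
    by_class: Dict[str, list] = {}
    for genotype, value in zip(genotypes, phenotype):
        by_class.setdefault(genotype, []).append(value)
    ref_mean = mean(by_class[reference])
    return {g: mean(v) - ref_mean
            for g, v in by_class.items() if g != reference}


# Hypothetical data for four accessions at one marker (illustration only).
example_classes = ["1:1", "1:1", "2:2", "2:2"]
example_oc = [52.0, 52.1, 42.8, 42.9]   # oil content, %
example_pc = [20.0, 21.0, 23.0, 24.0]   # protein content, %

if __name__ == "__main__":
    print(class_effects(example_classes, example_oc, reference="1:1"))
    print(class_effects(example_classes, example_pc, reference="1:1"))
```

In this toy example the 2:2 class lowers oil content and raises protein content relative to 1:1, which is the pattern described for all 5 markers.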

Comparative genome analysis 

To characterize the genomic context of the
associated markers, we performed a comparative genome
analysis of the 19 SSR markers associated with the OC trait
(Table 6). All genes closest to the markers were examined. Of the
19 markers, 11 are located in gene regions and 8 are in intergenic
regions. The flanking genes had various functions, such as ligase
(C01.560), transcription factor (C04.26, C13.438) and kinase
(C14.22). Moreover, we found that the markers Hs4082 and
Hs345 were located right next to C01.883 (ABC transporter G
family member 3 gene) and C04.786 (stearoyl-ACP desaturase
gene), respectively, both of which have been shown to be involved in plant lipid
biosynthesis. We also screened the sequences within 500 kb upstream
and downstream of each marker. Of the 19 markers, 17
were found to be close to plant lipid pathway genes. These results
further supported our association mapping conclusions.
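
The 500 kb screening step can be sketched as a simple interval query. The gene names and coordinates below are hypothetical placeholders, not values from the sesame assembly, and the function assumes that the marker and the genes lie on the same contig or scaffold.

```python
from typing import Dict, List, Tuple

WINDOW_BP = 500_000  # 500 kb on each side of the marker, as in the text


def genes_near_marker(marker_pos: int,
                      genes: Dict[str, Tuple[int, int]],
                      window: int = WINDOW_BP) -> List[str]:
    """Names of genes whose (start, end) interval overlaps the marker window."""
    lo, hi = marker_pos - window, marker_pos + window
    return sorted(name for name, (start, end) in genes.items()
                  if start <= hi and end >= lo)


# Hypothetical lipid-pathway genes on one scaffold (coordinates in bp).
example_genes = {
    "geneA": (100_000, 102_000),
    "geneB": (1_700_000, 1_702_000),
    "geneC": (1_400_000, 1_450_000),
}

if __name__ == "__main__":
    print(genes_near_marker(1_000_000, example_genes))
```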

Discussion 

To clarify the genetic mechanisms of fatty acid and protein
synthesis in sesame seeds, we performed association
mapping analysis of the OC and PC traits among 369 sesame
accessions using 112 genic-SSR markers. These accessions were
collected from 19 provinces of China and 15 other countries,
represent the genetic diversity of sesame germplasm for
association mapping, and include many
released Chinese and foreign cultivars. As an alternative to
traditional linkage analysis (QTL mapping), association
analysis based on linkage disequilibrium (LD) has been applied to
detect and locate quantitative trait loci (QTLs) in many
crops. GLM and MLM models are each used
to evaluate marker associations. In this article, 19 SSR
markers associated with the OC trait were detected under each of the 5
environments with the GLM model, and 24 markers were
associated with the protein content.

Figure 2. Estimated (a) LnP(D) and (b) ΔK values for a given K.

Figure 3. Genetic composition of individuals based on Bayesian posterior probability. According to the Bayesian posterior probability,
the natural population of 369 worldwide sesame lines is divided into two groups, shown in green and red, respectively.
doi:10.1371/journal.pone.0105757.g003

Sesame genetic diversity and the population structure 

Sesame is a diploid, self-pollinated oil crop with a
karyotype of 2n = 2x = 26. As all cultivars originate from the
sole cultivated species, Sesamum indicum L., genetic
diversity within regional sesame resources is largely narrow
[24,49-51]. In addition, many reports indicate that there is no clear
association between genotype and geographical origin, as many
sesame accessions from different geographic locations are
clustered into the same group in dendrograms [51-53]. Apart
from the natural history of predomesticated ancestors, the diversity
pattern of a domesticated species can be influenced by breeding
practice, germplasm collection and human activity [53-55]. In this
article, to ensure broad genetic variation, we
selected the 369 worldwide accessions for genetic analysis of
seed nutrition according to geographical origin, molecular clustering
and morphological diversity (Table S1). The heterozygosity
ranged from 0.27% (Hs373) to 23.12% (Hs4325). This result indicated
that the natural population could be used as core germplasm
for association mapping (Table S2). Population structure analysis
showed that many sesame accessions collected from the same
geographic locations were not grouped together, which further
supported the lack of a clear association between genotype and
geographical origin in sesame germplasm resources [51-53].

When performing association mapping in a population, LD
patterns between the functional loci and markers should be
analyzed first [56]. We analyzed the P-values of linkage
disequilibrium between the polymorphisms of the 112 SSR marker
loci using Fisher's exact test (Figure 1). As 363 (5.84%) pairwise
comparisons had high LD (D'>0.5), linkage disequilibrium
existed among the 369 sesame accessions. Accordingly, we considered
the natural population suitable for association analysis of the
oil and protein contents.

Oil and protein content variation and associated SSR 
markers 

Among large-scale sesame germplasm resources, the oil and
protein contents vary considerably. Yermanos et al. [57]
evaluated 721 sesame samples collected from more than 19
countries and found that the oil content varied from 40.4% to 59.8%,
with a mean of 53.1%. The protein content ranged from 19% to
31%, with an average of about 25% [58]. In our population, the oil
content varied from 27.89% to 58.73% with an average of 51.34%,
while the protein content varied from 16.72% to 27.79% with an
average of 21.19% (Table 1). The data reflected the great
variation in sesame seed composition among germplasm [57,59,60].

Comparison analysis showed a strong negative
correlation between oil content and protein content. Interestingly,
this tight relationship was also evident in the association
analysis. As shown by the GLM analysis in Tables 3 and 5, all
19 markers associated with OC were also detected for the PC trait, and the
alleles had opposite effects on OC and PC.
Similar phenomena occur in other oil crops, such as
oilseed rape, cotton, soybean and peanut [1,61-63]. Zhao et al.
[62] found 6 QTLs with pleiotropic effects on both oil content and
protein content in oilseed rape. In a cotton backcross inbred
population, of 17 QTLs for oil content and 20 QTLs for protein
content, 8 QTLs co-localized in the same chromosome regions
and controlled oil and protein contents with opposite additive
effects [63]. Hwang et al. (2014) detected 25 SNPs associated with
seed oil in 13 different genomic regions through a genome-wide
association study (GWAS), and 7 SNPs were significantly associated
with both protein and oil traits. For six of the seven marker loci, a
negative relationship existed between the effect on protein and that
on oil [64]. Meanwhile, QTLs or markers associated with
increases in both protein and oil contents have also been found. For example,
Zhao et al. (2006) found by conditional QTL mapping that 2 QTLs
controlling oil content were independent of protein content
[65]; Hwang et al. (2014) found that a SNP at the 4.92 Mb
position on Chr 9 was associated with increased protein and oil
contents [64].

In many crops, the seed oil content seems to be controlled
mostly by major genes [66-69]. In this study, we detected 19
markers associated with the OC trait in sesame using the GLM method,
with R² values ranging from 4.0% to 29.0%. Chen et al. (2010)
identified 27 QTLs related to oil content in oilseed rape, with
individual QTLs explaining 4.20% to 30.20% of the variation
[66]. In Arabidopsis thaliana, a single QTL or marker could
explain 17-19% of the variation in seed oil content [68]. In
soybean, the explained variation reached 14.3-45.6% [69]. In this
report, 112 polymorphic SSR markers were used for association
mapping. Compared with other common molecular markers, such
as SRAP and AFLP, SSR markers are more suitable for sesame
diversity analysis because of the narrow genetic basis [24,49-51]. The
marker distribution and density were analyzed using the sesame
genome assembly data. The contigs/scaffolds carrying the 112
markers covered approximately 180 Mb of the sesame genome,
~67% of the assembly size (270 Mb) and 50% of the
estimated genome size (~360 Mb, of which 90 Mb was believed
to be repeat sequences) (Table S2) [70,71]. Therefore,
association mapping using the 112 markers is meaningful and reliable,
even though some QTLs might be missed. To detect more QTLs,
new SSR or SNP markers could be applied in further research.
Table 6. Information on candidate genes related to the markers associated with the OC trait.

| Marker | Related gene (a) | Annotation | Nearby lipid gene (b) | A. thaliana homologous gene: locus (c) | A. thaliana homologous gene: annotation | Ref. |
| --- | --- | --- | --- | --- | --- | --- |
| Hs4381 | C01.460 | Thioredoxin M3 | C01.526 | At1G55260 | Lipid Transfer Protein type 5 | [25] |
| Hs02 | C01.560 | E3 ubiquitin-protein ligase RHF2A | C01.548 | At3G25110 | Acyl-ACP Thioesterase A | [26] |
| | | | C01.575 | At3G07400 | Lipid Acylhydrolase-like | |
| | C01.561 | Unknown | C01.601 | At2G30720 | Acyl-CoA Thioesterase | [27] |
| Hs4082 | C01.882 | Cell cycle protein | C01.873 | At1G53390 | ABC Transporter | [28] |
| | C01.883 | ABC transporter G family member 3 | C01.883 | At2G28070 | ABC Transporter | [28] |
| | | | C01.928 | At3G44830 | Diacylglycerol Acyltransferase | [29] |
| Hs635 | C02.696 | RAS-related protein Rab11A | C02.739 | At1G47620 | Midchain Alkane Hydroxylase | [30] |
| Hs4209 | C04.25 | Transmembrane domain containing protein | C04.38 | At1G31770 | ABC Transporter | [31] |
| | | | C04.56 | At1G71010 | Phosphatidylinositol-Phosphate Kinase | [32] |
| | C04.26 | TCP transcription factor 12 | C04.81 | At1G10900 | Phosphatidylinositol-Phosphate Kinase | [32] |
| | | | C04.96 | At1G77660 | Phosphatidylinositol-Phosphate Kinase | [32] |
| Hs4061 | C04.681 | Inositol trisphosphate 5-phosphatase 1 | - | | | |
| Hs345 | C04.785 | Unknown | C04.767 | At1G05630 | Phosphoinositide 5-Phosphatase | - |
| | C04.786 | Stearoyl-ACP desaturase | C04.786 | At2G43710 | Stearoyl-ACP Desaturase | [33] |
| Hs1956 | C04.845 | PAR1-related sequence | The same as Hs345 | | | |
| Hs393 | C13.438 | bZIP transcription factor 40 | C13.388 | At1G10900 | Phosphatidylinositol-Phosphate Kinase | [32] |
| | | | C13.471 | At5G10160 | Hydroxyacyl-ACP Dehydrase | [34] |
| | | | C13.504 | At1G17840 | ABC Transporter | [35] |
| | | | C13.514 | At4G36480 | Subunit of Serine Palmitoyltransferase | [36] |
| Hs205 | C14.21 | Thylakoid membrane phosphoprotein | C14.49 | At4G33550 | Lipid Transfer Protein type 3 | [37] |
| | | | C14.66 | At1G49430 | Long-Chain Acyl-CoA Synthetase | [38] |
| | C14.22 | Serine/threonine-protein kinase WNK4 | C14.111 | At2G46210 | Sphingobase-D8 Desaturase | [39] |
| | | | C14.132 | At2G46090 | Diacylglycerol Kinase | |
| Hs672 | C14.367 | Cyclin-P3.1 F-box protein | C14.359 | At2G45150 | CDP-DAG Synthase | [40] |
| | | | C14.413 | At4G33355 | Lipid Transfer Protein type 1 | |
| | C14.368 | Unknown | C14.428 | At2G44810 | Acylhydrolase (DAD1-like) | |
| Y2129 | C15.825 | Unknown | C15.92 | At2G29980 | Linoleate Desaturase | [41] |
| Hs1638 | C15.840 | Unknown | The same as Y2129 | | | |
| Hs618 | C25.41 | IAA-alanine resistance protein 1 | C25.69 | At5G08415 | Lipoate Synthase | [42] |
| | C25.42 | Beta-D-xylosidase | | | | |
| Hs376 | C26.474 | ADP-ribosylation factor | C26.417 | At1G15110 | Base-Exchange-type Phosphatidylserine Synthase | [43] |
| | | | C26.454 | At1G71010 | Phosphatidylinositol-Phosphate Kinase | |
| | | | C26.515 | At1G31812 | Acyl CoA Binding Protein | [44] |
| Hs377 | C26.475 | Unknown | The same as Hs376 | | | |
| Hs485 | sf00001.885 | Unknown | sf00001.95 | At4G04930 | Dihydrosphingosine Delta-4 Desaturase | [45] |
| | sf00001.886 | Unknown | | | | |
| Hs586 | sf00001.754 | ADP-ribosylation factor | | | | |
| Hs1036 | sf00044.12 | ABC transporter I family member 15 | sf00044.12 | At3G20320 | Acid-Binding Protein | [46] |
| | | | sf00044.41 | At2G25170 | Chromatin remodeling factor | [47] |
| | | | sf00044.61 | At4G19860 | Acyl acceptor Acyltransferase | |
| | | | sf00044.90 | At1G13210 | Translocase | |
| | | | sf00044.113 | At1G74320 | Choline Kinase | [48] |

Note: (a) Related genes are the genes containing or close to the screened markers.
(b) Nearby lipid genes are the genes located within 500 kb upstream or downstream of the marker.
(c) Only one of the homologous genes is listed in the table.
"-" indicates no known lipid genes in the location.
doi:10.1371/journal.pone.0105757.t006



Candidate genes and oil components 

To clarify the marker locations and obtain more genomic information
for the OC and PC traits, we screened the genes close to the
associated markers using the sesame genome assembly data. As a
result, 36 candidate genes related to the lipid pathway were identified
(Table 6) [25-48]. Most candidate genes were
involved in three pathways: (1) fatty acid and TAG (triacylglycerol)
synthesis and elongation, e.g., C01.526, C01.548 and C01.928; (2)
TAG degradation, e.g., C01.601; and (3) fatty acid dehydrogenation,
e.g., C04.786 (stearoyl-ACP desaturase, which determines the
ratio of saturated to unsaturated fatty acids) (Table S4).
Therefore, the seed oil content in the sesame accessions could
be regulated by three factors, i.e., oil synthesis ability, oil
degradation ability and oil component ratio (e.g., 18:1 and 18:2
fatty acids). Across accessions, allelic variation in any of the genes
involved in fatty acid and TAG synthesis, TAG degradation or
dehydrogenation could affect oil content. To test this
hypothesis, further studies of seed oil synthesis should be
performed in the future.

Conclusion 

We performed association mapping of seed oil
and protein content traits in 369 worldwide sesame accessions
using 112 SSR markers. A significant negative correlation between
oil content and protein content existed in the population.
Using the GLM method, 19 SSR markers were associated with the oil
content trait, each explaining a high proportion of the phenotypic
variation, and 24 SSR markers were
associated with the protein content trait. These
association results provide an efficient platform for seed
development research and MAS breeding in sesame.

Methods 

Plant materials 

A population of 369 core sesame germplasm accessions was
chosen according to genetic diversity analysis results and
phenotypic variation [51]. These core genotypes comprised 318
lines from 19 provinces of China and 51 lines from
15 other countries, which are preserved at the Sesame Germplasm Bank
of the Henan Sesame Research Center (HSRC), Henan
Academy of Agricultural Sciences (HAAS) (Table S1). All accessions were
grown with three replications at three different experimental
stations, Yuanyang (113.96°E, 35.05°N), Pingyu (114.62°E,
32.97°N) and Xinyang (114.08°E, 32.13°N), in 2011 and 2012.
Five or six young leaves of each accession were collected and
stored at -70°C for DNA extraction.



Oil and protein content analysis 

After harvest, ~20 g of seeds were collected from five plants
per line, and the seed OC and PC were measured with infrared
determination equipment (Perten DA7250, Sweden) according to
the manufacturer's instructions. The standard curve for measuring
the sesame oil and protein contents had been established
from chemical analysis results of 300 sesame accessions
(unpublished data, HSRC). Three replications of each sample
were assayed for phenotypic analysis. The mean value, broad-sense
heritability and correlation coefficient of the phenotypic data were
analyzed using the Statistical Analysis System software (SAS
Institute Inc. 2002) [72].

SSR genotyping 

The 112 polymorphic SSR primer pairs were selected from our SSR
marker bank [24,51,73] (Table S2). DNA extraction, PCR
amplification, electrophoresis and SSR genotyping analysis were
performed according to the methods described by Zhang et al.
[24]. The total number of polymorphic alleles at each SSR locus
was calculated from the results for all 369 lines.
Polymorphic SSR alleles present in only 4 (1%) or fewer
accessions were recorded as rare alleles.
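
The per-locus summaries can be sketched as follows. The rare-allele rule (4 or fewer carrier accessions) is taken from the text; the PIC formula is the widely used Botstein-type definition, PIC = 1 - Σp_i² - Σ_{i<j} 2p_i²p_j², which is an assumption here because the text does not state which formula was applied. All function names and example inputs are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence


def allele_frequencies(alleles: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each allele call at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def pic(freqs: Sequence[float]) -> float:
    """Polymorphism information content from allele frequencies."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - homozygosity - cross_term


def rare_alleles(carriers: Dict[str, int], max_carriers: int = 4) -> List[str]:
    """Alleles carried by max_carriers accessions or fewer."""
    return sorted(a for a, n in carriers.items() if n <= max_carriers)


if __name__ == "__main__":
    freqs = allele_frequencies(["a1", "a1", "a2", "a2"])
    print(freqs, pic(list(freqs.values())))
    print(rare_alleles({"a1": 300, "a2": 65, "a3": 4}))
```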

Linkage disequilibrium (LD) 

As population structure can produce spurious
associations between phenotypes and marker loci, we analyzed
the extent and structure of LD within the population before
selecting the appropriate association mapping strategy. To test
whether the 112 polymorphic SSR markers segregated
independently, LD analysis was conducted using the
dedicated procedure of the TASSEL software [49]. Both D' and r²
were used to quantify LD [74,75]. The significance (P
value) of D' for each SSR pair was determined with 100,000
permutations.

Population structure and relatedness analysis 

The population structure was determined using STRUCTURE
2.2 [76]. The mixture model and the independent allele frequency
model were used to analyze the population dataset. Five runs of
STRUCTURE were carried out for each number of populations
(K) (from 1 to 10), and each run started with a burn-in of 10,000
iterations, followed by 100,000 iterations. When running STRUCTURE,
we assumed that the inferred populations are in
Hardy-Weinberg equilibrium (HWE) and that the loci are unlinked. To
correct for the relatedness of individuals in further analyses, the
relatedness between individuals (relative kinship) was evaluated
using SPAGeDi 1.2 software [77]. The matrix of relative
kinship coefficients (K matrix) was applied for association analysis
using the mixed linear model (MLM, Q+K) method.

Association mapping and marker distribution in sesame 
genome 

Associations between the SSR markers and the oil and protein
content traits were investigated using both the general
linear model (GLM, Q) and the mixed linear model (MLM, Q+K)
in TASSEL 2.1, as described by Bradbury et al. [49]. The mean value
of the markers at P<0.005 was used for determining the
significance of marker-trait associations.

To determine distributions of the associated markers in sesame 
genome, we carried out the alignment of SSR markers and 
transcripts with the updated sesame genome data [70,71]. In the 
present genome assembly, the number of N50 scaffolds was 14 and
the number of N90 scaffolds was 64. In total, 29,798 gene models were
identified. Among related scaffolds or contigs, the sesame lipid
synthesis related genes were identified according to the homolo- 
gous comparison using the genes of A. thaliana from the Acyl 
Lipids pathway database [78] as queries. 
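
For readers unfamiliar with the assembly statistics quoted above, "the number of N50 scaffolds" is the smallest number of the longest scaffolds whose combined length reaches 50% of the assembly (often called L50), and likewise 90% for N90. The sketch below shows the calculation under that standard definition; the scaffold lengths are hypothetical and unrelated to the sesame assembly.

```python
from typing import Sequence, Tuple


def nx_length_and_count(lengths: Sequence[int],
                        fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx length, number of scaffolds needed) for a given fraction."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("no scaffold lengths supplied")


# Hypothetical scaffold lengths (arbitrary units).
example_lengths = [10, 8, 6, 4, 2]

if __name__ == "__main__":
    print(nx_length_and_count(example_lengths, 0.5))  # N50 length and count
    print(nx_length_and_count(example_lengths, 0.9))  # N90 length and count
```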